Blood metabolomic and transcriptomic signatures stratify patient subgroups in multiple sclerosis according to disease severity

SUMMARY
There are no blood-based biomarkers that distinguish patients with relapsing-remitting multiple sclerosis (RRMS) from those with secondary progressive multiple sclerosis (SPMS), although evidence supports metabolomic changes according to MS disease severity. Here, machine learning analysis of serum metabolomic data distinguished patients with RRMS from those with SPMS with high accuracy, and a putative score was developed that stratified MS patient subsets. The top differentially expressed metabolites between patients with SPMS and patients with RRMS included lipids and fatty acids and metabolites enriched in pathways related to cellular respiration: notably, elevated lactate and glutamine (gluconeogenesis-related) and acetoacetate and bOHbutyrate (ketone bodies), and reduced alanine and pyruvate (glycolysis-related). Serum metabolomic changes were recapitulated in the whole blood transcriptome, whereby differentially expressed genes were also enriched in cellular respiration pathways in patients with SPMS. The final gene-metabolite interaction network demonstrated a potential metabolic shift from glycolysis toward increased gluconeogenesis and ketogenesis in SPMS, indicating metabolic stress, which may trigger stress response pathways and subsequent neurodegeneration.


INTRODUCTION
Multiple sclerosis (MS) is an autoimmune disease with both inflammatory and neurodegenerative components, affecting ~2.5 million people worldwide. Approximately 85% of patients with MS are diagnosed with the relapsing-remitting form of the disease (RRMS), characterized by reversible or near-reversible periods of neurological impairment (relapses). 2,3 Most patients with RRMS eventually move to a more progressive disease phase associated with brain and spinal cord atrophy, where disability increases and relapses become less likely (secondary progressive MS, SPMS). 1,3,4,5 It is now accepted that progression leading to significant, irreversible disability is due to both "relapse-associated worsening" and "progression independent of relapse activity" and is seen in almost all people with MS over time. 6,7 Although the pathogenesis of these two clinical phases of MS is different, the transition from RRMS to SPMS is a diagnostic challenge: there are no validated imaging or biofluid biomarkers that can distinguish between these two phases of MS and, as a result, SPMS diagnosis is made retrospectively following irreversible disability accrual. This has implications for appropriate treatment strategies, as there are many treatments approved for patients with RRMS but very few disease-modifying therapies approved for SPMS. 10 Blood metabolites assessed using various nuclear magnetic resonance (NMR) and mass spectrometry platforms can discriminate people with MS from age- and sex-matched healthy donors 11,12 and from people with other neurodegenerative diseases, 13 as well as patients with SPMS from those with RRMS. 14,15,18,19 Thus, a better understanding of the molecular landscape in the blood of patients with RRMS versus SPMS could facilitate a more personalized approach to treatment from the time of diagnosis, thereby delaying or reducing the accumulation of progressive disability. Biomarkers identifying or predicting disease severity could also help to assess the efficacy of new drugs and improve clinical trial design. This study combined analysis of serum metabolomics and whole blood transcriptomics in patients with RRMS compared to SPMS to build an integrated network describing these different phases of MS and to improve understanding of the potential mechanisms driving disease progression.

Serum metabolites can stratify between multiple sclerosis patient subgroups and healthy and disease controls
Sera from patients with relapsing-remitting MS (RRMS, n = 52), patients with secondary progressive MS (SPMS, n = 29), healthy donors (HCs, n = 80), and patients with neuromyelitis optica (disease controls, DCs, n = 30; an autoantibody-mediated disease that shares some symptoms with MS and may be misdiagnosed as MS) were analyzed using an NMR serum metabolomic platform (Table S1 for participant demographics; Table S2 for list of metabolites). Metabolomic data were compared between groups using seven machine learning (ML) models (Figure S1 for the study plan). 20 All ML models discriminated between groups with high accuracy, with the ensemble model bagged logistic regression (LR) showing the best performance discriminating between HCs and patients with RRMS (accuracy: 0.871; area under the curve-receiver operating characteristic (AUC ROC): 0.920) and the boosted LR model showing the best performance discriminating between HCs and patients with SPMS (accuracy: 0.945; AUC ROC: 0.910) (Tables 1 and S3). Similar stratification was observed between MS patient groups and DCs (Tables 1 and S4).
ML analysis of metabolomic data was also able to stratify patients with SPMS vs. RRMS (Table 2). Overall, LR and random forest (RF) models performed best when predicting SPMS cases, correctly identifying 26 out of 29 (89.7%) patients (Figure 1A), while boosted LR performed better for RRMS cases, with excellent specificity (0.961) and an accuracy of 93.67% (Table 2). A significant separation between SPMS and RRMS patient groups was also observed using a sparse partial least squares-discriminant analysis (sPLS-DA) model (Figures S2A and S2B).
To derive a signature that could discriminate patients with SPMS from patients with RRMS, features were compared across all analyses (Figure 1B). Thirty-six measures, including metabolites, lipids, and clinical demographic information, were featured in three models or more. The top features included glutamine (Gln) and the clinical features age and EDSS, which appeared in all seven models as well as in the sPLS-DA models (Figures 1B, S2A, and S2B). Linoleic acid (LA), cholesterol esters in medium low-density lipoprotein (M-LDL-CE), free cholesterol in small low-density lipoprotein (S-LDL-FC), acetoacetate, saturated fatty acids (SFA), omega-6 fatty acids (%), mono- and polyunsaturated fatty acids (MUFA/PUFA %), and free cholesterol in extra-large very low-density lipoprotein (XL-VLDL-FC %) were identified by six ML models and/or sPLS-DA (Figures 1B and S2C). Furthermore, glycolysis-associated metabolites (lactate, glucose, pyruvate, and citrate), amino acids (valine, tyrosine, histidine, and alanine), and sphingomyelin were featured in three ML models or more (Figure 1B). The full list of metabolites identified by a minimum of one ML model is given in Table S5.
As expected, age and EDSS, known to be associated with severity and progression, were identified by all the models (Figure 1B). The influence of age and EDSS was further examined using the LR with interactions (LR + I) model, which was performed with and without the age and EDSS features (Figure 1A; Table 2; Table S6). Although excluding age and EDSS decreased model accuracy (from 88.9% to 80.3%) and AUC ROC (from 0.936 to 0.844), metabolites alone were still capable of classifying patients with SPMS and RRMS with good performance (Table 2). The ability of metabolomic features identified by three or more ML models (n = 29, excluding age and EDSS) to distinguish patients with SPMS from patients with RRMS was confirmed by sPLS-DA (Figure 1C; Figure S2D for loadings on components 1 and 2). AUC ROC curve analysis showed that the best performing ML model (boosted LR, Table 2) outperformed the individual top-ranked metabolites (identified by six models or more from Figure 1B) and the patient features age and EDSS when discriminating between patients with SPMS vs. RRMS (Figure 1D).
To develop a robust method to stratify patients with RRMS from patients with SPMS based on serum metabolomics, the optimum cut-off value was calculated for each serum metabolite identified in >4 ML models (Youden Index; 21 Table S7). Features with a Youden Index >0.5 were selected and used to build a points-based system for stratifying patient groups (Autoscore 22). The Autoscore model (training dataset, 70% of patients) identified the optimum cut-off values for the top five features (cholines, glutamine, saturated fatty acids, acetoacetate, and sphingomyelins) (Figure 1E for the importance ranking of the features; Table S8). The performance of the final model was confirmed in the test set (30% of patients), in which it stratified RRMS from SPMS with an AUC ROC of 0.9464 (Figure 1F). This analysis demonstrated that serum metabolites could be used to develop a reliable method to stratify patients, which could potentially outperform clinical markers of severity such as EDSS and age.
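The cut-off selection described above can be illustrated with a minimal sketch. It assumes the standard definition of the Youden Index, J = sensitivity + specificity - 1, and assumes that higher metabolite values indicate SPMS; the function name `youden_cutoff` and the toy values are hypothetical and are not taken from the study data or from the Autoscore package.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called positive when value >= cutoff.
    labels: 1 = SPMS (positive class), 0 = RRMS (negative class).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_cutoff, best_j = None, -1.0
    # Every observed value is a candidate cut-off; ties keep the lowest one.
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        true_neg = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical toy measurements (arbitrary units), not study data.
metabolite = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
is_spms = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(metabolite, is_spms)
keep_feature = j > 0.5  # selection rule stated in the text
```

In this toy example the selected cut-off is 0.05 with J = 0.75, so the feature would pass the >0.5 threshold and be carried forward to the scoring step.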

Differential metabolite expression is mirrored in the whole blood transcriptome in patients with secondary progressive multiple sclerosis compared to relapsing-remitting multiple sclerosis
To evaluate whether changes in the serum metabolome were reflected more widely, whole blood RNA-sequencing was performed on a subset of patients. Differential expression analysis comparing MS patient groups identified 1052 differentially expressed genes (DEGs), which were able to cluster patients with SPMS separately from patients with RRMS (Figure 3A). Of these, 948 genes were upregulated and 104 downregulated (Figure 3B). The top 20 up- and downregulated DEGs are listed in Table S9.
Pathway enrichment analysis of up- and downregulated genes in SPMS vs. RRMS also identified pathways associated with cell metabolism, including "metabolism of RNA," "cellular response to stress," "metabolism of lipids," and "cellular respiration" (Figures 3C and 3D; Figures S3A-S3E; Table S10 for DEG lists). In total, 215 DEGs were associated with the above-mentioned metabolic pathways (Table S11). These include nuclear respiratory factor 1 (NRF1), C-C chemokine receptor type 5 (CCR5), glutamic-oxaloacetic transaminase 2 (GOT2), mitoregulin (LINC00116/MTLN), and O-sialoglycoprotein endopeptidase (OSGEP), which form part of the list of the top 20 up- and downregulated genes (Table S9). Immune activation pathways including "TNFA signaling," "cytokine signaling in the immune system," and "regulation of immune effector processes" were also differentially enriched in patients with SPMS compared with RRMS (Figures 3D and 3E; Figures S3A and S3F). Inflammatory mechanisms are known to be different between the two disease phases, as demonstrated by altered serum cytokine expression in RRMS compared to progressive MS and by TNF-α-induced oligodendrocyte cell death, supporting demyelination in progressive disease. 26,27 Interestingly, network analysis of the "cellular respiration pathway" revealed an association with "Amyotrophic lateral sclerosis" (Figure S3G), 28 supporting a role for dysregulated cell metabolism in the neurodegenerative processes in SPMS. Finally, many of the metabolic and immune-associated pathways enriched in patients with SPMS compared to RRMS were also identified in an independent gene expression dataset comparing whole blood, lymphocytes, myeloid cells, and oligodendrocyte precursor cells from patients with SPMS and patients with RRMS (Figure 3E). 17 Together, these results indicated a systemic alteration in metabolic pathways in patients with SPMS compared to patients with RRMS, focused on RNA biosynthesis, lipid metabolism, and cellular respiration pathways (Figures 2A, 3C, 3D, and S3).

Gene-metabolite interaction network proposes a potential mechanism behind the metabolic switch in multiple sclerosis severity
The relationship between metabolites and DEGs in matched samples was further explored using correlation analysis. Multiple significant correlations (p < 0.01) were identified between metabolites and DEGs in the cellular respiration, regulation of immune effector response, metabolism of lipids, and metabolism of RNA pathways (Figures S4A-S4D). Most of these correlations were positive, and the large number of correlations suggests coordination between metabolites and genes associated with cellular respiration and metabolism from the transcriptomic to the metabolomic level.
To further integrate the metabolomic and transcriptomic data, network analysis was performed on all 1052 DEGs and the SPMS metabolomic signature. Within the final network, 26 upregulated and four downregulated genes were found to interact with eight metabolites (creatinine, citrate, pyruvate, lactate, phenylalanine, tyrosine, glycine, and linoleic acid) from the metabolomic signature (Figure 4; Table S12 for the function of genes within the gene-metabolite interaction network). Although phenylalanine was not identified by this analysis, it has previously been found to be deficient in patients with MS. 11 While lipids could not be included within the network, the upregulated genes SCD5 (Stearoyl-CoA Desaturase 5), PLA2G12A (Phospholipase A2 Group XIIA), CARM1 (Coactivator Associated Arginine Methyltransferase 1), and SLC25A1 (Solute Carrier Family 25 Member 1) were within the metabolism of lipids pathway. The downregulated gene GOT2 (log2FC: -0.617) is associated with multiple metabolic pathways, including amino acid metabolism, aminoacyl-tRNA biosynthesis, metabolism of lipids, pyruvate metabolism, gluconeogenesis, and ketogenesis, and has been previously associated with MS. 29 Taken together, this gene-metabolite interaction network demonstrates the crosstalk between key genes and metabolites, which could highlight molecular processes associated with MS severity. The results suggest a potential metabolic switch from glycolysis toward increased gluconeogenesis and ketogenesis, supported by fatty acid β-oxidation and conversion of the ketogenic and gluconeogenic amino acids phenylalanine and tyrosine, in patients with SPMS compared to those with RRMS (Figure S5).

DISCUSSION
1][32] Furthermore, a putative score was developed that could be used for stratification between MS patient subsets, which, if validated, supports the use of metabolite signatures for diagnosis, treatment decisions, and clinical trial design. More detailed analysis identified that the SPMS metabolomic profile was enriched for metabolites within the gluconeogenesis and ketogenesis pathways but reduced in glycolysis-related metabolites, potentially reflecting cellular respiration changes in patients with SPMS compared to RRMS. This was corroborated using whole blood transcriptomic analysis, which identified genes enriched in the "cellular response to stress," "metabolism of lipids," and "cellular respiration" pathways in patients with SPMS compared to RRMS. Finally, multiple correlations between SPMS-associated metabolites and genes were identified, and an interaction network between key differentially expressed genes and metabolites highlighted glycolysis/gluconeogenesis, pyruvate metabolism, the TCA cycle, and aminoacyl-tRNA biosynthesis as associated with MS severity.
MS can be difficult to differentiate from other neurological diseases; diagnosis is largely based on clinical symptoms and MRI results, 33 and there are no validated blood-derived biomarkers for MS diagnosis or for the differentiation of RRMS from progressive disease. 34 Identifying proxy measures of the neuronal manifestations of MS beyond MRI (which provides a retrospective indication of disease severity) and cerebrospinal fluid (CSF) sampling via lumbar puncture represents a clinical need. 35 Several potential biomarkers assessing neuronal damage and inflammation and measurable in serum have been proposed in MS. 36,37 Serum levels of neurofilament light chains (NfL, released into the CSF following axonal damage and neuronal death) are highly correlated with CSF NfL measurements in matched samples, and longitudinal serum NfL measurements can be used to predict MS disease activity/progression, including new and enlarging T2 lesions and brain volume loss in individual patients. 34 Other biomarkers present in both serum and CSF that have been proposed for assessing MS disease prognosis include glial fibrillary acidic protein (a marker of neuronal damage) and the inflammation markers osteopontin, CXCL13, and CD163. 36

Figure 1 (partial legend). ... S2 for metabolite abbreviations. (E and F) Metabolite features in (B) analyzed using the Youden Index to identify features that best stratify patients with SPMS vs. RRMS. Features with Youden J statistic >0.5 were analyzed using the Autoscore model. (E) Importance ranking of top variables (see Tables S7 and S8). (F) AUC-ROC plot showing performance of the Autoscore test set.
Figure legend (partial). ... S5. Metabolites identified as associated with SPMS by >1 model (Table S5) were analyzed by MSEA.

In this study, supervised analysis of serum metabolomic data using ML models identified a metabolite signature that could stratify patients with RRMS from patients with SPMS; healthy individuals from patients with RRMS and SPMS; and patients with neuromyelitis optica (similar to MS in both clinical and radiological presentation) from patients with RRMS and SPMS, all with high accuracy. This suggests that metabolomic signatures could be used to aid MS diagnosis and define specific disease endotypes that could aid patient stratification and personalized medicine. 9][40][41] However, despite the many previous studies, no unique metabolite signature has been defined, likely due to the use of different metabolomic platforms and difficulty in developing clinical metabolomic assays. 30,31 The advantage of the work presented here is that an established, clinical-grade, and standardized metabolomic platform was employed. This platform has been validated with a high correlation against both clinical metabolite standards and standard detection methodology, namely clinical chemistry autoanalyzer measures. 42 This platform has also been used to assess metabolomic signatures in several large biobanks, including the UK Biobank 42 and FINRISK biobanks, 43 for biomarker research and translation. Our study uses this reproducible and cost-effective metabolomic platform to compare patients with SPMS vs. RRMS to better define the heterogeneity of MS. Furthermore, using this platform could support future validation of the SPMS-associated metabolite signature in existing MS patient cohorts, providing an opportunity to identify and develop new diagnostic and prognostic serum biomarkers for SPMS as well as provide insight into systemic molecular therapeutic targets.
Our study also suggests that phenotyping disease stage in patients with MS could be improved by incorporating additional measures of key pathological processes identified in this study, such as the changes in lipid metabolism and increased ketogenesis observed in SPMS. For example, the ketone bodies bOHbutyrate and acetoacetate were both increased in SPMS. Interestingly, acetoacetate alone (AUC ROC 0.87) outperformed EDSS (AUC ROC 0.70) and age (AUC ROC 0.83) when discriminating between patients with SPMS and RRMS. Similar studies report a systemic elevation of acetoacetate and bOHbutyrate in the CSF and blood from patients with MS compared to controls. 40,44,45 Ketone bodies can cross the blood-brain barrier, so they may be produced in the brain and then eliminated into the blood, where they can be detected. 46 The strong association of ketone bodies with SPMS suggests that monitoring serum levels may have the potential to pinpoint whether a patient is likely to progress before irreversible disability accrual. However, it remains to be validated whether ketone bodies (e.g., acetoacetate) are systemic biomarkers that could add to the list of serum biomarkers for precision phenotyping in MS. Work is ongoing to relate serum and CSF measures of these metabolites.
The use of multi-omics (metabolomics and transcriptomics) can also shed light on the heterogeneous pathology associated with SPMS. Previous studies have used similar approaches; for example, analysis of spatial transcriptomics and proteomics in fresh frozen brain tissue from patients with progressive MS identified molecular pathways associated with neurodegeneration that were not targeted by current therapeutic strategies. 47 In that study, Kaufmann and colleagues identified that myelination is downregulated in oligodendrocyte precursor cells, a finding that may correlate with elevated serum sphingomyelin within our metabolomic signature. 47 Integration of multi-omics data using unsupervised techniques (e.g., Multi-Omics Factor Analysis 48 and Data Integration Analysis for Biomarker Discovery using Latent components 49) on a range of omic datasets (metabolomics; bulk, single cell, and spatial transcriptomics; proteomics; and so forth) could also help to identify putative drivers of heterogeneity in SPMS. 50,51 Applying this type of analysis across multiple tissues/cell types (blood, CSF, and brain tissues) to map pathogenesis and identify similarities in disease mechanisms/markers may well help to characterize and validate reliable biomarkers of severity to help clinicians with diagnosis (which is currently retrospective, after a certain period of irreversible disability accrual has occurred) and with implementing interventions. 50 Furthermore, identifying blood-based biomarkers would help to avoid more invasive techniques for diagnosis and monitoring.

Figure 3 (partial legend). (E) Validation of gene expression pathways using an independent dataset. Gene lists were taken from reference 17. This was performed on multiple cell types, including myeloid cells, lymphocytes, whole blood (myeloid cells and lymphocytes combined), and oligodendrocyte precursor cells. For consistency, only genes with a log2FC of 0.585 and FDR < 0.05 were considered.
In addition, this type of granular interrogation of the molecular landscape in patients with MS could help to identify disease endotypes (patients who may be similar clinically but may respond to treatments in different ways) for a more personalized medicine approach. 34 For example, in-depth analysis of blood-based immune phenotype and serum metabolomics has identified blood-based signatures associated with the future development of anti-drug antibodies to interferon-beta in patients with RRMS. 52,53 This type of information could be used to guide treatment decisions as well as help patient selection for future clinical trials to improve outcomes. 34 Finally, this multi-omic approach has the potential to identify new drug targets that can be validated in experimental models of disease as well as in other human datasets. 47,54
It should be noted that omic data have some limitations. For example, metabolomic data can be influenced by factors such as age, diet, 55 and hormonal status, 56 and transcriptomic changes do not always translate to changes in protein expression or activity due to RNA silencing and post-transcriptional modifications. 57 The analysis platform selected can also influence the reproducibility of the omic data obtained. For metabolomics, NMR is more cost effective (meaning more samples can be analyzed), more reproducible, and requires minimal sample preparation compared to mass spectrometry methods, although the number of molecules measured by NMR is more limited. 58 For transcriptomic data, batch-to-batch variation can make results difficult to replicate without appropriate controls. Furthermore, as mentioned above, the use of a single omic approach provides only partial insight into biological processes and does not reflect the molecular complexities of biological systems. 51 Multi-omic analysis provides a better understanding of the biological system; however, omic integration requires matched samples, which are not always available, meaning that improved cohort and experimental planning is needed to fully leverage the advantages that omic analysis can provide. 51,54
Serum cholesterol (LDL and VLDL cholesterol subsets, M-LDL-CE %, S-LDL-FC, and XL-VLDL-FC %) and fatty acids (MUFA/PUFA %, omega-6 %, SFA, and LA) were important features differentiating directly between patients with SPMS and RRMS. There is growing evidence that altered levels of cholesterol and cholesterol derivatives in blood and CSF could both be implicated in the pathogenesis of MS and be used clinically as biomarkers of disease activity, disease progression, and response to treatment. 59,60 Elevated total and LDL cholesterol are associated with increased disease progression assessed by T2 lesion development (a measure of brain tissue damage) and the EDSS score. 61,62 VLDL-FC is also reported to be highly correlated with the EDSS disability score in patients with MS. 63 A recent metabolomic analysis using a PLS-DA model also identified lipoproteins as among the top features discriminating between patients with RRMS and SPMS, including HDL/LDL and VLDL, which were all lower in SPMS. 41 In experimental models, defects in cholesterol efflux to HDL can inhibit the remyelination processes in the brain and contribute to disability and progression. 64 Elevated serum HDL is associated with lower rates of disease progression in MS. 59 We found that HDL subsets were largely elevated in SPMS compared to RRMS, although this does not necessarily reflect the function of HDL, which can be inflammatory. 65 Overall, our findings largely correspond to previous reports; differences are likely due to the metabolomic platform used, whereby a more detailed and complex breakdown of lipoprotein subsets is examined in the current study. 42
Fatty acid metabolism is also known to be disrupted in patients with MS in general, 66 with differences also reported between patients with RRMS and SPMS. 31 A recent study showed that both plasma omega-3 and omega-6 fatty acids were associated with measures of disability in patients with MS, including MRI measures of brain volume and the serum biomarkers NfL and glial fibrillary acidic protein. 67 Furthermore, patients with RRMS and progressive MS could be distinguished based on their distinct serum fatty acid derivative profiles, 68 supporting our findings that fatty acids are important features for discriminating between MS patient subgroups. Arachidonic acid is an omega-6 fatty acid that can be metabolized into pro-inflammatory prostaglandins and leukotrienes, which are known to be elevated in the brain of patients with MS. 69 Notably, patients with progressive MS had increased levels of arachidonic acid derivatives, 67 and arachidonic acid derivatives correlated positively with EDSS scores, while omega-3 fatty acid derivatives had more variable associations with markers of disability and progression. 68 Interestingly, LA is a PUFA that is metabolized to omega-3 fatty acids and arachidonic acid, and both LA concentration and the ratio of LA (LA %) among total fatty acids were among the top features differentiating between patients with SPMS and RRMS. However, while in this study LA concentration was increased, suggesting that the conversion of LA to omega-3 fatty acids may be impaired in SPMS, another study showed that gamma-LA was reduced in patients with progressive MS. 15 Overall, these results support a role for blood metabolites within the cholesterol and fatty acid metabolism pathways as reliable biomarkers for disease subtype stratification.

Figure 4. Gene-metabolite interaction network
Related to Table S12. The gene-metabolite interaction network was constructed using the network analysis tool in MetaboAnalyst. This was performed by combining all 1052 DEGs and any metabolite that was identified by the machine learning models. The gene-metabolite interaction network illustrates the crosstalk between 27 upregulated (red circles) and 4 downregulated genes (blue circles) with 7 metabolites from the metabolomic signature (black squares). Genes are also colored based on the strength of their fold-change. Whilst phenylalanine was not identified as part of the metabolomic signature, it has been found to be lowered in MS in previous publications. These genes and metabolites are associated with the metabolic pathways aminoacyl-tRNA biosynthesis, glycolysis/gluconeogenesis, pyruvate metabolism, and the TCA cycle.
The dyslipidemia identified in our metabolomic analysis may indicate that patients with SPMS have increased cardiovascular disease (CVD) risk compared to patients with RRMS. This observation agrees with a study showing that patients with more severe or progressive disease are older and have more CVD events compared to patients with RRMS. 70 Hypertension and heart disease are also associated with brain atrophy development in patients with MS. 71 Thus, lowering CVD risk may also reduce the risk of increased brain atrophy and progression in patients with SPMS. Patients with SPMS may benefit from lifestyle changes that reduce CVD risk, including diet, 72 or alternative strategies for lowering serum lipid levels and CVD risk, such as statins. Interestingly, high-dose simvastatin, a CNS-penetrant statin, attenuates brain atrophy and disease progression in patients with SPMS, 73 and a phase III clinical trial testing the efficacy of simvastatin in SPMS is in progress (MS-STAT2; NCT03387670, http://www.isrctn.com/ISRCTN82598726). Of note, the mechanism of action of statins in SPMS may go beyond lipid lowering, with wider effects on immune activation and signaling reported. 74
Pathway and network analysis of metabolomic and transcriptomic data also identified potential molecular mechanisms associated with the pathogenic processes in patients with RRMS compared with SPMS. The metabolomic signatures suggested that ketone body and glutamate metabolism and gluconeogenesis were enriched in patients with SPMS compared to RRMS, supporting a potential change in cellular respiration toward increased gluconeogenesis and ketogenesis and defects in glycolysis in SPMS compared with RRMS. This was supported by the whole blood transcriptomic analysis. DEGs were enriched in pathways associated with the metabolism of RNA, cellular respiration, and lipid metabolism as well as various immune-associated pathways. Several studies have examined the blood and/or brain tissue transcriptomic profiles in patients with RRMS and SPMS, either compared with HCs or with each other. 16,17,75,76 One study compared PBMC microarray data from untreated patients with RRMS and SPMS 75 and, similar to our findings, immune pathways and lipid and arachidonic acid metabolism were dysregulated between the two patient subgroups. Another microarray analysis of whole blood from patients with RRMS and primary progressive MS also identified differences in metabolic pathways, including a downregulation of the oxidative phosphorylation pathway and an upregulation of RNA metabolism-associated pathways in progressive MS. 76 Interestingly, when we compared DEGs from whole blood (this study) with those obtained from SPMS brain tissue (oligodendrocyte precursor cells, monocytic cells, and lymphocytes), 17 several overlapping pathways were highlighted, relevant to cellular responses to stress, metabolism of RNA, cellular amide metabolic process, and regulation of apoptotic signaling. 17 This supports the systemic nature of MS disease pathology, whereby there is overlap between gene expression and metabolic pathways between whole blood and cells from brain tissue from patients with SPMS.
In summary, this study shows that serum metabolites could be attractive candidate biomarkers to identify patients with MS who have transitioned to a more severe/progressive disease. Currently, there are no validated imaging or biofluid biomarkers that can distinguish between patients with RRMS and SPMS, and this remains a diagnostic challenge. Metabolites have both diagnostic and prognostic potential to improve SPMS diagnosis, which is currently made retrospectively following irreversible disability accrual. Finally, the results presented here could indicate a systemic alteration in metabolic pathways in patients with SPMS compared to patients with RRMS, focused on RNA biosynthesis, lipid metabolism, and cellular respiration pathways, that could provide insight into the pathogenic mechanisms associated with these two phases of the disease.

Limitations of the study
This study had some limitations, including an unbalanced sample size between patients with RRMS compared to controls and patients with SPMS for the metabolomic analysis and a small sample size for the transcriptomic analysis. Despite this, the data reported here are supported
The identified metabolomic signature was validated by generating a sPLS-DA (a supervised clustering ML approach that combines classification and parameter selection into one operation) with only metabolites that came up in three ML models or more. This analysis was performed using the R package mixOmics 77 using the metabolomic data for the following comparisons: DC vs. HC; RRMS vs. HC; SPMS vs. HC; SPMS vs. RRMS; RRMS vs. DC; and SPMS vs. DC.
To visualise how well individual metabolites (identified by six ML models or more) could discriminate SPMS from RRMS, compared to the metabolomic signature, area under the curve -receiver operating characteristic (AUC-ROC) curves were generated and based on the best performing ML model.RStudio 4.2.0 (The R Foundation, Vienna, Austria)87 were used for ML analysis.

Logistic regression with/without interactions
Less important features were shrunk to zero using the least absolute shrinkage and selection operator (lasso) method, which uses the absolute value of the coefficient as a penalty. Tuning the regularisation variable lambda (λ) determines the strength of shrinkage. Lasso logistic regression (with and without interactions) was conducted using the glmnet R package. 83 All 111 metabolites were included. Categorical predictors were coded as dummy variables with the following treated as the reference class: smoking status (never smoked), sex (male) and ethnicity (Caucasian/White). Age, baseline EDSS and BMI were treated as continuous variables. For the comparison RRMS vs. SPMS, an additional model in which age and EDSS were excluded was generated for LR + I, to test whether metabolites alone can stratify RRMS and SPMS patients. ln(λ) was optimized using the R package and set to λ = 0.059 and λ = 0.065 for logistic regression with and without interactions, respectively.
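The shrinkage step can be illustrated with a minimal sketch. It assumes a simple proximal-gradient solver, not the algorithm implemented in glmnet, and it omits the intercept. The function names, step size and toy data are hypothetical; only the L1 penalty and the role of λ come from the text.

```python
import math

def soft_threshold(value, penalty):
    """Shrink value toward zero; anything within the penalty becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_logistic(rows, labels, lam, step=0.5, iterations=200):
    """L1-penalised logistic regression without intercept (illustrative only)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(iterations):
        # Gradient of the mean logistic loss with respect to each weight.
        gradient = [0.0] * n_features
        for row, label in zip(rows, labels):
            score = sum(w * x for w, x in zip(weights, row))
            prob = 1.0 / (1.0 + math.exp(-score))
            for j, x in enumerate(row):
                gradient[j] += (prob - label) * x / len(rows)
        # Gradient step followed by the lasso shrinkage step.
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, gradient)]
    return weights

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[-1.0, 0.3], [-1.0, -0.3], [1.0, 0.3], [1.0, -0.3]]
labels = [0, 0, 1, 1]
weights = lasso_logistic(rows, labels, lam=0.059)
```

In this toy case the informative feature keeps a positive coefficient while the noise feature stays at exactly zero, and a sufficiently large λ removes every feature.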

Ensemble methodology
To improve predictive performance of previous LR models, the ensemble methods bootstrap aggregating (bagging) and boosting were utilised. 84 Bagging involves training the model on each bootstrapped dataset. A bootstrapped dataset is made by randomly sampling patients from the original dataset with replacement, i.e., a patient can appear more than once in the bootstrap dataset; however, the same number of patients must be used as in the original dataset. The trained model is then added to the ensemble as an "individual". Every "individual" that is added to the ensemble has been trained on the same number, but a different combination, of patients. The aggregation process then involves collating each individual's assessment, and the features (i.e., metabolites) that were voted for by the most individuals become the ensemble's overall assessment. For each bootstrap dataset that is generated, a side product known as an out-of-bag dataset (patients not used within the bootstrap dataset) is generated and can be used for validation. Boosting trains each model to give more weight to patients misclassified by previous models, until all patients have been correctly predicted. The parameters for bagging and boosting were set to B = 100 and B = 1000, respectively, with 10-fold cross-validation to prevent model overfitting. The ensemble models were created using the R packages caret 85 and caTools. 86
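The bootstrap and out-of-bag construction described above can be sketched as follows. This is a minimal illustration rather than the caret implementation; the function names are hypothetical, and the majority-vote rule (a feature is kept when more than half of the individuals select it) is an assumption about how votes are aggregated.

```python
import random

def bootstrap_split(patient_ids, rng):
    """Return (in_bag, out_of_bag); in_bag is drawn with replacement at the original size."""
    in_bag = [rng.choice(patient_ids) for _ in patient_ids]
    chosen = set(in_bag)
    out_of_bag = [pid for pid in patient_ids if pid not in chosen]
    return in_bag, out_of_bag

def majority_vote(selections):
    """Keep features selected by more than half of the ensemble members (assumed rule)."""
    counts = {}
    for selected in selections:
        for feature in set(selected):
            counts[feature] = counts.get(feature, 0) + 1
    return sorted(f for f, c in counts.items() if c > len(selections) / 2)

rng = random.Random(0)
patients = list(range(81))  # 81 = 52 RRMS + 29 SPMS patients in this study
in_bag, out_of_bag = bootstrap_split(patients, rng)

# Hypothetical feature selections from three ensemble members.
votes = majority_vote([["lactate", "alanine"], ["lactate"], ["lactate", "citrate"]])
```

Each call to `bootstrap_split` yields one training set and its out-of-bag validation set; repeating it B times gives the B individuals of the ensemble.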

Random forest
RF is a machine-learning algorithm that assigns observations into classes (HC/RRMS/SPMS/DC) by creating thousands of decision trees, or a "forest", and averaging the results. Each decision tree predicts the value of the target variable by learning simple decision rules inferred from the features (i.e., metabolites) in the dataset. Only a small random sample of predictors are candidates for selection at each node, so the created trees are decorrelated. Importance was quantified by the Gini index, which represents the total variance across the two classes, the quality of each split and the purity of each node. The values for mtry and ntree were tuned using the R package caret. 85 The parameters were set to mtry = 10, ntree = 10,000.
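As an illustration of how node purity feeds into Gini importance, the sketch below computes the Gini impurity of a node and the impurity decrease achieved by one split. The function names are hypothetical, and the example is a generic textbook calculation rather than the exact importance measure reported by the R packages.

```python
def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

parent = ["RRMS", "RRMS", "SPMS", "SPMS"]
perfect_split = gini_decrease(parent, ["RRMS", "RRMS"], ["SPMS", "SPMS"])
useless_split = gini_decrease(parent, ["RRMS", "SPMS"], ["RRMS", "SPMS"])
```

A feature whose splits produce large decreases across many trees receives a high importance; a split that leaves both children as mixed as the parent contributes nothing.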

Support vector machine (SVM)
SVM is a supervised classification method which optimally separates data into two classes by creating a hyperplane. The radial basis function kernel was used, as this dataset was not linearly separable. Values for C, epsilon and gamma were tuned using the R Package 4.2.0. 99 The parameters were set to C = 4.0, epsilon = 0.1, gamma = 0.01.
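For reference, the radial basis function kernel measures similarity between two samples as exp(-gamma * squared Euclidean distance). A minimal sketch follows; the function name is hypothetical and the default gamma is the tuned value reported above.

```python
import math

def rbf_kernel(x, z, gamma=0.01):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical samples
apart = rbf_kernel([0.0, 0.0], [3.0, 4.0])  # squared distance of 25
```

Identical samples have similarity 1, and similarity decays toward 0 as samples move apart; a smaller gamma makes that decay slower, giving a smoother decision boundary.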

Neural network (NN)
NN uses the multi-layer perceptron algorithm with backpropagation, which can learn both linear and non-linear models. NNs incorporate large numbers of processing nodes that are densely interconnected, similar to the human brain. These nodes are organised into layers: input, hidden and output. The output node/classification is affected by the weights of values in the hidden layers; these weights are adjusted when the model is trained to give the best classification.

Model performance
10-fold cross-validation was used to evaluate model performance. The following performance metrics were calculated from the confusion matrices: F1 score (a weighted average of precision [positive predictive value] and recall [sensitivity]); specificity (the true negative rate); and classification accuracy (CA; the proportion of correctly classified cases).
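These metrics follow directly from the four cells of a two-class confusion matrix, as in the sketch below. The counts are hypothetical and SPMS is assumed to be the positive class; F1 is computed in its usual form, the harmonic mean of precision and recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    precision = tp / (tp + fp)            # positive predictive value
    recall = tp / (tp + fn)               # sensitivity
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=20, fp=5, tn=45, fn=10)
```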

Sparse Partial Least Squares Discriminant Analysis (sPLS-DA)
sPLS-DA is a supervised clustering ML approach that combines classification and parameter selection into one operation. This analysis was performed using the R package mixOmics 77 using the metabolomic data for the following comparisons: DC vs. HC; RRMS vs. HC; SPMS vs. HC; SPMS vs. RRMS; RRMS vs. DC and SPMS vs. DC. 10-fold cross-validation with 50 repetitions was applied to prevent model overfitting. sPLS-DA models with different component numbers were assessed by 10-fold cross-validation using the balanced error rate to evaluate model performance. For the above comparisons, the number of components was selected based on the combination that gave the lowest overall estimation error rate. These were selected as optimal and could give the best discriminatory performance for further analysis. The separation of the comparisons was presented by projecting the samples into the subspace constructed by component 1 and component 2. The top weighted features were selected and presented through variable loading plots. sPLS-DA results were visualised using the R package ggplot2. 87
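The balanced error rate averages the error rate of each class, so the larger group cannot dominate the estimate when group sizes differ. A minimal sketch with a hypothetical function name and toy labels:

```python
def balanced_error_rate(true_labels, predicted_labels):
    """Mean of the per-class misclassification rates."""
    per_class = []
    for cls in sorted(set(true_labels)):
        members = [i for i, t in enumerate(true_labels) if t == cls]
        errors = sum(1 for i in members if predicted_labels[i] != cls)
        per_class.append(errors / len(members))
    return sum(per_class) / len(per_class)

# Toy example: all four RRMS samples correct, one of two SPMS samples wrong.
truth = ["RRMS"] * 4 + ["SPMS"] * 2
predicted = ["RRMS"] * 4 + ["RRMS", "SPMS"]
ber = balanced_error_rate(truth, predicted)
```

Here the overall error rate is 1/6, but the balanced error rate is 0.25 because half of the smaller class was misclassified.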

Metabolite scoring -Determining optimum cut points
Cut-point analysis was performed using the Youden index in conjunction with ROC analysis to determine optimum cut-offs or thresholds that can stratify RRMS from SPMS patients. This was done for each metabolite identified by four ML models or more. The Youden index was calculated using the biomarker analysis tool in MetaboAnalyst 5.0. 21
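The Youden index is J = sensitivity + specificity - 1, and the optimum cut-point is the threshold that maximises J. The sketch below is a minimal illustration, not the MetaboAnalyst implementation; it assumes that higher metabolite values indicate SPMS, and the function name and values are hypothetical.

```python
def youden_cut_point(negatives, positives):
    """Return (threshold, J); a value >= threshold is called positive."""
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(negatives) | set(positives)):
        sensitivity = sum(v >= threshold for v in positives) / len(positives)
        specificity = sum(v < threshold for v in negatives) / len(negatives)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # keep the first threshold reaching the maximum
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Hypothetical metabolite levels for RRMS (negatives) and SPMS (positives).
separable = youden_cut_point([1, 2, 3], [4, 5, 6])
overlapping = youden_cut_point([1, 2, 3, 4], [3, 4, 5, 6])
```

For a metabolite that is lower in SPMS, the same function can be applied after negating the values.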

Building a clinical model based on serum metabolite expression to discriminate between RRMS vs. SPMS patients
AutoScore is a machine learning-based clinical score generator consisting of 6 modules (variable ranking, variable transformation, score derivation, model selection, score fine-tuning and model evaluation) for developing interpretable point-based scores. 22 Compared to complex models, point-based scores are more explainable and interpretable and can be easily implemented and validated in clinical practice. The patient dataset was divided into non-overlapping training (70%: 57 patients) and testing (30%: 24 patients) datasets. The training set was used to develop the clinical scoring model and the testing set was used to evaluate the best score/cut-off. Markers with a Youden index above 50% were used within the analysis. These variables were ranked by random forest (parameter: ntree = 100) and a parsimony plot was used to identify the optimum number of variables to use within the clinical model. The parsimony plot considers the number of variables in each model against the model performance (measured by AUC) and chooses the optimum number of variables by balancing model complexity with predictive ability. To build the clinical model and produce scores, AutoScore uses multivariable logistic regression. Continuous variables were categorised, and quantiles were used to determine cut-off values of the data points. The initial scores generated by AutoScore were fine-tuned using the cut-offs calculated with the Youden index to improve model interpretability. This fine-tuned model and the optimum threshold (97 points) for stratifying RRMS from SPMS patients were evaluated on the test dataset.
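How a point-based score is applied to a patient can be sketched as a table lookup followed by a threshold comparison. Everything in the score table below (variable names, cut-offs and points) is hypothetical and is not the published score; only the 97-point threshold comes from the text, and treating totals at or above it as SPMS is an assumption.

```python
import bisect

# Hypothetical score table: cut-offs split each variable into intervals and each
# interval carries a number of points. None of these values come from the study.
SCORE_TABLE = {
    "metabolite_a": {"cutoffs": [0.5, 1.0], "points": [0, 20, 45]},
    "metabolite_b": {"cutoffs": [2.0], "points": [0, 30]},
    "metabolite_c": {"cutoffs": [0.1, 0.2], "points": [25, 10, 0]},
}
THRESHOLD = 97  # optimum threshold reported in the text

def total_points(patient):
    """Sum the points of the interval each variable falls into."""
    total = 0
    for name, spec in SCORE_TABLE.items():
        interval = bisect.bisect_right(spec["cutoffs"], patient[name])
        total += spec["points"][interval]
    return total

def classify(patient):
    """Assumed direction: totals at or above the threshold are called SPMS."""
    return "SPMS" if total_points(patient) >= THRESHOLD else "RRMS"

high = {"metabolite_a": 1.2, "metabolite_b": 2.5, "metabolite_c": 0.05}
low = {"metabolite_a": 0.3, "metabolite_b": 1.0, "metabolite_c": 0.3}
```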

Metabolite set enrichment analysis
Metabolite Set Enrichment Analysis was performed on the metabolomic signature identified by ML models (i.e., all metabolites identified by at least one ML model), sPLS-DA and forest plots using the KEGG and SMPDB databases within MetaboAnalyst 5.0. 21 This generated a report on over-representation analysis. Metabolic pathways that were considered had p values < 0.05.
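Over-representation analysis is typically assessed with a hypergeometric test: given a reference set of metabolites, a pathway of known size and a signature of known size, it gives the probability of seeing at least the observed overlap by chance. The sketch below is a generic illustration under that assumption, with a hypothetical function name and toy numbers, and is not the MetaboAnalyst code.

```python
from math import comb

def over_representation_p(universe, pathway_size, signature_size, overlap):
    """Upper-tail hypergeometric p value: P(overlap or more pathway members in the signature)."""
    upper = min(pathway_size, signature_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, signature_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, signature_size)

# Toy numbers: 10 metabolites measured, a pathway of 3, a signature of 3.
all_three = over_representation_p(10, 3, 3, 3)
at_least_none = over_representation_p(10, 3, 3, 0)
```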

Protein-protein interaction network analysis
To visualise protein-protein interactions, network analysis was performed in Cytoscape on genes taken from upregulated pathways. 81 Fold changes (FCs) and colour-coded charts of the top sub-pathways to which these genes belonged were mapped onto the genes. Enrichment maps were also generated to visualise the connection, significance, and clustering between sub-pathways under these upregulated GO and Reactome terms. Publication enrichment was performed to identify PubMed papers that had genes similar to the gene set from the selected pathways. 81

Validation of gene expression pathways using independent datasets
To validate gene expression pathways, we used an independent study by Kihara et al., 17 who performed single nucleus RNA-seq on SPMS vs. RRMS patients. This was conducted on multiple cell types including myeloid cells, lymphocytes, whole blood (myeloid and lymphocytes combined) and oligodendrocyte precursor cells. Multiple list enrichment (Metascape 79) was used to compare pathways enriched across our dataset and Kihara et al., 2018. Only genes that had an FC < -1.5 or > +1.5 and FDR < 0.05 were considered in the comparison.
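The gene filter can be sketched as below. It assumes the fold-change cut-off of 1.5 is applied on the log2 scale as |log2FC| > log2(1.5), which is about 0.585; the gene names and values are hypothetical.

```python
import math

LOG2_FC_CUTOFF = math.log2(1.5)  # about 0.585
FDR_CUTOFF = 0.05

def passes_filter(log2_fc, fdr):
    """Keep genes changed by more than 1.5-fold in either direction with FDR < 0.05."""
    return abs(log2_fc) > LOG2_FC_CUTOFF and fdr < FDR_CUTOFF

# Hypothetical differential expression results: (gene, log2 fold change, FDR).
results = [
    ("GENE_A", 1.2, 0.01),    # up, significant
    ("GENE_B", 0.3, 0.001),   # change too small
    ("GENE_C", -2.0, 0.2),    # not significant
    ("GENE_D", -0.9, 0.04),   # down, significant
]
kept = [gene for gene, log2_fc, fdr in results if passes_filter(log2_fc, fdr)]
```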

Circos plots
Correlations between biological pathways identified by pathway enrichment analysis and the metabolomic signature that discriminates SPMS from RRMS were visualised using correlation circle plots. 88 Pearson's product moment correlation was calculated between the normalised count of pathway genes and metabolite levels. 89 Only correlations with p < 0.01 or p < 0.05 were considered.
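Pearson's product moment correlation between a gene's normalised counts and a metabolite's levels across the same patients can be computed as in the sketch below. The function name and values are hypothetical, and the associated p value is not computed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical paired measurements across four patients.
gene_counts = [1.0, 2.0, 3.0, 4.0]
metabolite_levels = [2.0, 4.0, 6.0, 8.0]
r_positive = pearson_r(gene_counts, metabolite_levels)
r_negative = pearson_r(gene_counts, metabolite_levels[::-1])
```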

Figure 1. Serum metabolomics can stratify patients with SPMS vs. RRMS. Related to Table 2, Figure S2 and Tables S5-S8. NMR serum metabolomics analysis in patients with SPMS (n = 29) and RRMS (n = 52). Metabolomic signatures associated with SPMS vs. RRMS determined using machine learning. (A) Confusion matrices showing number of correct (blue squares) and incorrect (green squares) classifications for each model. The sum (Σ) of each row and column is given. The models were: logistic regression (LR) with/without interactions (I) including/excluding age and EDSS (expanded disability status score), bagged and boosted LR, support vector machine (SVM), random forest (RF), neural network (NN) and sparse partial least square discriminant analysis (sPLS-DA). See Figures S2A and S2B. (B) Comparison of metabolites selected by each machine learning model (black squares). Metabolites selected by > 6 models highlighted in red bold; metabolites selected by ≥ 3 models in black bold.

Figure 1. Continued (C) sPLS-DA plot (sparse partial least squares-discriminant analysis) to validate metabolomic signature in (B) in metabolites identified by ≥ 3 models. See Figure S2D for features in components 1 and 2. (D) Area under the curve-receiver operator characteristic (AUC-ROC) of top 14 metabolites and/or clinical features identified by ≥ 6 models and the best performing machine learning model, boosted LR. See Table S2 for metabolite abbreviations. (E and F) Metabolite features in (B) analyzed using the Youden Index to identify features that best stratify patients with SPMS vs. RRMS. Features with Youden J statistic > 0.5 were analyzed using the AutoScore model. (E) Importance ranking of top variables (see Tables S7 and S8). (F) AUC-ROC plot showing performance of AutoScore test set.

Figure 2. Metabolite set enrichment analysis (MSEA) on SPMS vs. RRMS metabolomic signature. Related to Table S5. Metabolites identified as associated with SPMS by > 1 model (Table S5) were analyzed by MSEA. (A) Enrichment analysis using KEGG database. Pathways with a significant enrichment p < 0.05 are shown (-log10(p value)). Enrichment ratio (ER) is indicated along y axis. (B) Network of metabolic pathways derived from SMPDB database. Nodes are sized using the -log10(p value) and colored on a red to yellow gradient. The larger and redder the node, the greater the significance of the p value. (C-G) Bar charts showing the relative expression of metabolites (normalised values) in serum from patients with RRMS n = 52 (blue) vs. SPMS n = 29 (orange): (C) lactate and glutamine, representative metabolites of gluconeogenesis pathway; (D) acetoacetate and bOHbutyrate (ketogenesis pathway); (E) citrate (TCA); (F) alanine and pyruvate (glycolysis-related); and (G) total triglycerides, cholines and total fatty acids associated with lipid metabolism. T-tests were performed to identify statistically significant differences between patients with RRMS (blue) and patients with SPMS (red). p value, mean and ± SD shown.

Figure 3. Metabolism of RNA, cellular responses to stress, immune effector response, cellular respiration, metabolism of lipids and tRNA processing pathways are dysregulated between patients with SPMS and RRMS. Related to Figure S3 and Tables S9 and S10. RNA-sequencing was performed on n = 8 patients with SPMS and n = 5 patients with RRMS followed by differential gene expression and pathway enrichment analysis. Refer to Figures S3A-S3F. (A) Principal component analysis on all 1052 DEGs clustering patients with SPMS (orange) from patients with RRMS (blue). (B) Volcano plot showing differentially expressed genes (DEGs): Log2 fold change (> 1.5 or < -1.5) and FDR-adjusted p value < 0.05. Colored points represent significantly up- (red) and down- (blue) regulated genes in SPMS compared to RRMS. (C and D) Pathway enrichment analysis of DEGs analyzed by Metascape to identify regulated pathways. (C) Bar chart of top 20 significantly enriched pathways upregulated in patients with SPMS (Gene Ontology (GO), Reactome, Hallmark, Wikipathways). Pathways are ranked by p value. (D) Bar chart of top 14 significantly enriched pathways downregulated in patients with SPMS (Gene Ontology (GO), Reactome, Hallmark, Wikipathways). Pathways are ranked by p value. (E) Validation of gene expression pathways using an independent dataset. Gene lists were taken from Kihara et al. 17 This was performed on multiple cell types including myeloid cells, lymphocytes, whole blood (myeloid and lymphocytes combined) and oligodendrocyte precursor cells. For consistency, only genes with a log2FC of 0.585 and FDR < 0.05 were considered.

Table 1. Metabolites distinguish between patients with RRMS and SPMS from healthy and disease controls with high accuracy. Performance statistics for the top two performing models comparing RRMS, SPMS cohorts with healthy and disease controls. Ensemble methods gave best performance: bagged and boosted logistic regression (LR). Sensitivity (true positive rate); specificity (true negative rate). Statistics are rounded to 3 decimal places. HC, healthy controls; DC, disease control (neuromyelitis optica). See Tables S3 and S4 for list of top differentiating features.

Table 2. Comparison of predictive model performance of SPMS vs. RRMS